Exploring the Contextual Factors Affecting Multimodal Emotion Recognition in Videos
Authors
Abstract
Emotional expressions form a key part of user behavior on today's digital platforms. While multimodal emotion recognition techniques are gaining research attention, there is a lack of deeper understanding of how visual and non-visual features can be used to better recognize emotions in certain contexts, but not others. This study analyzes the interplay between the effects derived from facial expressions, tone of voice, and text in conjunction with two contextual factors: 1) the gender of the speaker, and 2) the duration of the emotional episode. Using a large public dataset of 2,176 manually annotated YouTube videos, we found that while multimodal features consistently outperformed bimodal and unimodal features, their performance varied significantly across different emotions and contexts. Multimodal features performed particularly well for male speakers in recognizing most emotions. Furthermore, they performed better on shorter than on longer videos for neutral, happiness, sadness, and anger. These findings offer new insights towards the development of more context-aware and empathetic systems.
Similar references
Multimodal Emotion Recognition
Multimodal fusion is the process whereby two or more forms of input are gathered together in order to produce a higher overall classification accuracy than individual unimodal systems. This is a popular technique in emotion recognition. In this study, we attempted to discover how much we could improve upon individual unimodal systems using decision level fusion. To accomplish this, we acquired ...
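Decision-level fusion as described above can be sketched as a weighted average of each unimodal classifier's class-probability output, with the fused prediction taken as the highest-scoring class. This is a minimal illustrative sketch, not the paper's implementation; the modality names, weights, and three-class setup are assumptions for the example.

```python
def decision_level_fusion(unimodal_probs, weights=None):
    """Fuse per-modality class-probability vectors by weighted averaging.

    unimodal_probs: list of probability vectors, one per modality.
    Returns (predicted_class_index, fused_probability_vector).
    """
    n = len(unimodal_probs)
    if weights is None:
        weights = [1.0 / n] * n  # equal weighting by default
    n_classes = len(unimodal_probs[0])
    fused = [sum(w * p[c] for w, p in zip(weights, unimodal_probs))
             for c in range(n_classes)]
    return fused.index(max(fused)), fused

# Hypothetical example: audio, video, and text classifiers each output
# probabilities over three emotion classes (e.g. happy, sad, angry).
audio = [0.6, 0.3, 0.1]
video = [0.5, 0.2, 0.3]
text  = [0.2, 0.7, 0.1]
label, fused = decision_level_fusion([audio, video, text])
```

With equal weights this reduces to averaging the three probability vectors; per-modality weights could instead reflect each unimodal system's validation accuracy.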
Multimodal Emotion Recognition
Speech is the primary means of communication between human beings in their day-to-day interaction with one another. Speech, if confined in meaning as the explicit verbal content of what is spoken, does not by itself carry all the information that is conveyed during a typical conversation, but is in fact nuanced and supplemented by additional modalities of information, in the form of vocalized e...
Multimodal Emotion Recognition
Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important r...
Multimodal Emotion Recognition Using Multimodal Deep Learning
To enhance the performance of affective models and reduce the cost of acquiring physiological signals for real-world applications, we adopt a multimodal deep learning approach to construct affective models from multiple physiological signals. For the unimodal enhancement task, we indicate that the best recognition accuracy of 82.11% on the SEED dataset is achieved with shared representations generated by...
MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos
Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...
Journal
Journal title: IEEE Transactions on Affective Computing
Year: 2023
ISSN: 1949-3045, 2371-9850
DOI: https://doi.org/10.1109/taffc.2021.3071503